These are a few points from an email I sent to members of the Data Science Sydney Meetup. I suppose other Kaggle beginners may find them useful.
My first steps when working on a new competition are:
- Read all the instructions carefully to understand the problem. One important thing to look at is what measure is being optimised. For example, minimising the mean absolute error (MAE) may require a different approach from minimising the mean square error (MSE).
- Read messages on the forum. Especially when joining a competition late, you can learn a lot from the problems other people had. And sometimes there’s even code to get you started (though code quality may vary and it’s not worth relying on).
- Download the data and look at it a bit to understand it better, noting any insights you may have and things you would like to try. Even if you don’t know how to model something, knowing what you want to model is half of the solution. For example, in the DSG Hackathon (predicting air quality), we noticed that even though we had to produce hourly predictions for pollutant levels, the measured levels don’t change every hour (probably due to limitations in the measuring equipment). This led us to try a simple “model” for the first few hours, where we predicted exactly the last measured value, which proved to be one of our most valuable insights. Stupid and uninspiring, but we did finish 6th :-). The main message is: look at the data!
- Set up a local validation environment. This will allow you to iterate quickly without making submissions, and will increase the accuracy of your model. For those with some programming experience: local validation is your private development environment, the public leaderboard is staging, and the private leaderboard is production.
What you use for local validation depends on the type of problem. For example, for classic prediction problems you may use one of the classic cross-validation techniques. For forecasting problems, you should try to have a local setup that is as close as possible to the setup of the leaderboard. In the Yandex competition, the leaderboard was based on data from the last three days of search activity, so you should use a similar split for the training data (and of course, use exactly the same local setup for all the team members so you can compare results). A minimal sketch of such a time-based split appears after this list.
- Get the submission format right. Make sure that you can reproduce the baseline results locally.
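As a concrete example of a time-based validation split, here's a minimal sketch. The file and column names (train.csv, timestamp) are hypothetical placeholders, not the actual competition data:

```python
import pandas as pd

# Mimic a leaderboard that is based on the last three days of activity:
# everything up to the cutoff is local training data, the rest is local validation.
train = pd.read_csv("train.csv", parse_dates=["timestamp"])  # hypothetical file/column names
cutoff = train["timestamp"].max() - pd.Timedelta(days=3)

local_train = train[train["timestamp"] <= cutoff]
local_valid = train[train["timestamp"] > cutoff]
print(len(local_train), len(local_valid))
```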
Now, the way things often work is:
- You try many different approaches and ideas. Most of them lead to nothing. Hopefully some lead to something.
- Create ensembles of the various approaches.
- Repeat until you run out of time.
- Win. Hopefully.
Note that in many competitions, the differences between the top results are not statistically significant, so winning may depend on luck. But getting one of the top results also depends to a large degree on your persistence. To avoid disappointment, I think the main goal should be to learn things, so spend time trying to understand how the methods that you’re using work. Libraries like sklearn make it really easy to try a bunch of models without understanding how they work, but you’re better off trying fewer things and developing the ability to reason about why they do or don’t work.
An analogy for programmers: while you can use an array, a linked list, a binary tree, and a hash table interchangeably in some situations, understanding when to use each one can make a world of difference in terms of performance. It’s pretty similar for predictive models (though they are often not as well-behaved as data structures).
Finally, it’s worth watching this video by Phil Brierley, who won a bunch of Kaggle competitions. It’s really good, and doesn’t require much understanding of R.
Any comments are welcome!
Data’s hierarchy of needs
One of my favourite blog posts in recent times is The Log: What every software engineer should know about real-time data’s unifying abstraction by Jay Kreps. That post comprehensively describes how abstracting all the data produced by LinkedIn’s various components into a single log pipeline greatly simplified their architecture and enabled advanced data-driven applications. Among the various technical details there are some beautifully-articulated business insights. My favourite one defines data’s hierarchy of needs:
Visually, it looks something like this:
How to (almost) win Kaggle competitions
Last week, I gave a talk at the Data Science Sydney Meetup group about some of the lessons I learned through almost winning five Kaggle competitions. The core of the talk was ten tips, which I think are worth putting in a post (the original slides are here). Some of these tips were covered in my beginner tips post from a few months ago. Similar advice was also recently published on the Kaggle blog – it’s great to see that my tips are in line with the thoughts of other prolific kagglers.
Tip 1: RTFM#
It’s surprising to see how many people miss out on important details, such as remembering the final date to make the first submission. Before jumping into building models, it’s important to understand the competition timeline, be able to reproduce benchmarks, generate the correct submission format, etc.
Tip 2: Know your measure#
A key part of doing well in a competition is understanding how the measure works. It’s often easy to obtain significant improvements in your score by using an optimisation approach that is suitable to the measure. A classic example is optimising the mean absolute error (MAE) versus the mean square error (MSE). It’s easy to show that given no other data for a set of numbers, the predictor that minimises the MAE is the median, while the predictor that minimises the MSE is the mean. Indeed, in the EMC Data Science Hackathon we fell back to the median rather than the mean when there wasn’t enough data, and that ended up working pretty well.
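A quick numeric illustration of the median/mean point, using toy numbers with an outlier rather than any competition data:

```python
import numpy as np

values = np.array([1, 2, 2, 3, 100])  # toy data with an outlier

# Evaluate two constant predictors: the median and the mean of the values.
for name, constant in [("median", np.median(values)), ("mean", np.mean(values))]:
    mae = np.mean(np.abs(values - constant))
    mse = np.mean((values - constant) ** 2)
    print(f"{name}: constant={constant:.1f}, MAE={mae:.2f}, MSE={mse:.2f}")

# The median gives the lower MAE, while the mean gives the lower MSE.
```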
Tip 3: Know your data#
In Kaggle competitions, overspecialisation (without overfitting) is a good thing. This is unlike academic machine learning papers, where researchers often test their proposed method on many different datasets. This is also unlike more applied work, where you may care about data drifting and whether what you predict actually makes sense. Examples include the Hackathon, where the measurements of pollutants in the air were repeated for consecutive hours (i.e., they weren’t really measured); the multi-label Greek article competition, where I found connected components of labels (doesn’t generalise well to other datasets); and the Arabic writers competition, where I used histogram kernels to deal with the features that we were given. The general lesson is that custom solutions win, and that’s why the world needs data scientists (at least until we are replaced by robots).
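As an aside, here's a minimal sketch of one common form of histogram kernel – the histogram intersection kernel – plugged into scikit-learn's SVC as a custom kernel. The data is random toy data, and this is not the exact setup used in the Arabic writers competition:

```python
import numpy as np
from sklearn.svm import SVC

def histogram_intersection_kernel(X, Y):
    # K(x, y) = sum_i min(x_i, y_i), computed for all pairs of rows in X and Y.
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)

# Hypothetical histogram features (each row is a normalised histogram) and random labels.
rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(8), size=50)
y = rng.integers(0, 2, size=50)

model = SVC(kernel=histogram_intersection_kernel).fit(X, y)
print(model.score(X, y))
```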
Tip 4: What before how#
It’s important to know what you want to model before figuring out how to model it. It seems like many beginners tend to worry too much about which tool to use (Python or R? Logistic regression or SVMs?), when they should be worrying about understanding the data and what useful patterns they want to capture. For example, when we worked on the Yandex search personalisation competition, we spent a lot of time looking at the data and thinking what makes sense for users to be doing. In that case it was easy to come up with ideas, because we all use search engines. But the main message is that to be effective, you have to become one with the data.
Tip 5: Do local validation#
This is a point I covered in my Kaggle beginner tips post. Having a local validation environment allows you to move faster and produce more reliable results than when relying on the leaderboard. The main scenarios where you should skip local validation are when the data is too small (a problem I had in the Arabic writers competition), or when you run out of time (towards the end of the competition).
Tip 6: Make fewer submissions#
In addition to making you look good, making few submissions reduces the likelihood of overfitting the leaderboard, which is a real problem. If your local validation is set up well and is consistent with the leaderboard (which you need to test by making one or two submissions), there’s really no need to make many submissions. Further, if you’re doing well, making submissions erodes your competitive advantage by showing your competitors what scores are obtainable and motivating them to work harder. Just resist the urge to submit, unless you have a really good reason.
Tip 7: Do your research#
For any given problem, it’s likely that there are people dedicating their lives to its solution. These people (often academics) have probably published papers, benchmarks and code, which you can learn from. Unlike actually winning, which is not only dependent on you, gaining deeper knowledge and understanding is the only sure reward of a competition. This has worked well for me, as I’ve learned something new and applied it successfully in nearly every competition I’ve worked on.
Tip 8: Apply the basics rigorously#
While playing with obscure methods can be a lot of fun, it’s often the case that the basics will get you very far. Common algorithms have good implementations in most major languages, so there’s really no reason not to try them. However, note that when you do try any methods, you must do some minimal tuning of the main parameters (e.g., number of trees in a random forest or the regularisation of a linear model). Running a method without minimal tuning is worse than not running it at all, because you may get a false negative – giving up on a method that actually works very well.
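For example, minimal tuning of a random forest's main parameter can be a small cross-validated grid search (toy data, and assuming a recent scikit-learn):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Tune only the main parameter (number of trees) before judging the method.
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid={"n_estimators": [50, 200, 500]},
                      cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```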
An example of applying the basics rigorously is in the classic paper In defense of one-vs-all classification, where the authors showed that the simple one-vs-all (OVA) approach to multiclass classification is at least as good as approaches that are much more sophisticated. In their words: “What we find is that although a wide array of more sophisticated methods for multiclass classification exist, experimental evidence of the superiority of these methods over a simple OVA scheme is either lacking or improperly controlled or measured”. If such a failure to perform proper experiments can happen to serious machine learning researchers, it can definitely happen to the average kaggler. Don’t let it happen to you.
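For what it's worth, a simple OVA baseline takes only a few lines with scikit-learn (toy dataset for illustration, not the paper's experimental setup):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)

# One-vs-all: fit one binary classifier per class, predict the highest-scoring class.
ova = OneVsRestClassifier(LogisticRegression(max_iter=1000))
print(cross_val_score(ova, X, y, cv=5).mean())
```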
Tip 9: The forum is your friend#
It’s very important to subscribe to the forum to receive notifications on issues with the data or the competition. In addition, it’s worth trying to figure out what your competitors are doing. An extreme example is the recent trend of code sharing during the competition (which I don’t really like) – while it’s not a good idea to rely on such code, it’s important to be aware of its existence. Finally, reading the post-competition summaries on the forum is a valuable way of learning from the winners and improving over time.
Tip 10: Ensemble all the things#
Not to be confused with ensemble methods (which are also very important), the idea here is to combine models that were developed independently. In high-profile competitions, it is often the case that teams merge and gain a significant boost from combining their models. This is worth doing even when competing alone, because almost no competition is won by a single model.
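In its simplest form, such a combination is just a weighted average of the models' predictions, with the weights chosen on a local validation set. The numbers below are toy values for illustration:

```python
import numpy as np

# Hypothetical predicted probabilities from three independently developed models.
predictions = np.array([[0.9, 0.2, 0.6],
                        [0.8, 0.1, 0.4],
                        [0.7, 0.3, 0.7]])

# Weights would normally be chosen based on local validation performance.
weights = np.array([0.5, 0.3, 0.2])
blended = weights @ predictions  # weighted average across models for each prediction
print(blended)
```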
Building a Bandcamp recommender system (part 1 – motivation)
I’ve been a Bandcamp user for a few years now. I love the fact that they pay out a significant share of the revenue directly to the artists, unlike other services. In addition, despite the fact that fans may stream all the music for free and even easily rip it, almost $80M has been paid out to artists through Bandcamp to date (including almost $3M in the last month) – strong evidence that the traditional music industry’s fight against piracy is a waste of resources and time.
One thing I’ve been struggling with since starting to use Bandcamp is the discovery of new music. Originally (in 2011), I used the browse-by-tag feature, but it is often too broad to find music that I like. A newer feature is the Discoverinator, which is meant to emulate the experience of browsing through covers at a record store – sadly, I could never find much stuff I liked using that method. Last year, Bandcamp announced Bandcamp for fans, which includes the ability to wishlist items and discover new music by stalking/following other fans. In addition, they released a mobile app, which made the music purchased on Bandcamp much easier to access.
All these new features definitely increased my engagement and helped me find more stuff to listen to, but I still feel that Bandcamp music discovery could be much better. Specifically, I would love to be served personalised recommendations and be able to browse music that is similar to specific tracks and albums that I like. Rather than waiting for Bandcamp to implement these features, I decided to do it myself. Visit BCRecommender – Bandcamp recommendations based on your fan account to see where this effort stands at the moment.
While BCRecommender has already helped me discover new music to add to my collection, building it gave me many more ideas on how it can be improved, so it’s definitely a work in progress. I’ll probably tinker with the underlying algorithms as I go, so recommendations may occasionally seem weird (but this always seems to be the case with recommender systems in the real world). In subsequent posts I’ll discuss some of the technical details and where I’d like to take this project.
It’s probably worth noting that BCRecommender is not associated with or endorsed by Bandcamp, but I doubt they would mind since it was built using publicly-available information, and is full of links to buy the music back on their site.
Building a recommender system on a shoestring budget (or: BCRecommender part 2 – general system layout)
This is the second part of a series of posts on my BCRecommender – personalised Bandcamp recommendations project. Check out the first part for the general motivation behind this project.
BCRecommender is a hobby project whose main goal is to help me find music I like on Bandcamp. Its secondary goal is to serve as a testing ground for ideas I have and things I’d like to explore.
One question I’ve been wondering about is: how much money does one need to spend on infrastructure for a simple web-based product before it reaches meaningful traffic?
The answer is: not much at all. It can easily be done for less than $1 per month.
This post discusses my exploration of this question by describing the main components of the BCRecommender system, without getting into the algorithms that drive it (which will be covered in subsequent posts).
The general flow of BCRecommender is fairly simple: crawl publicly-available data from Bandcamp (fan collections and tracks/albums = tralbums), generate recommendations based on this data (static lists of tralbums indexed by fan for personalised recommendations and by tralbum for similarity), and present the recommendations to users in a way that’s easy to browse and explore (since we’re dealing with music it must be playable, which is easy to achieve by embedding Bandcamp’s iframes).
First iteration: Django & AWS#
The first iteration of the project was implemented as a Django project. Having never built a Django project from scratch, I figured this would be a good way to learn how it’s done properly. One thing I was keen on learning was using the Django ORM with an SQL database (in the past I’ve worked with Django and MongoDB). This ended up working less smoothly than I expected, perhaps because I’m too used to MongoDB, or because SQL forces you to model your data in unnatural ways, or because I insisted on using SQLite for simplicity. Whatever it was, I quickly started missing MongoDB, despite its flaws.
I chose AWS for hosting because my personal account was under the free tier, and using a micro instance is more than enough for serving a website with no traffic. I considered Google App Engine with its indefinite free tier, but after reading the docs I realised I don’t want to jump through so many hoops to use their system – Google’s free tier was likely to cost too much in pain and time.
While an AWS micro instance is enough for serving the recommendations, it’s not enough for generating them. Rather than paying Amazon for another instance, I figured that using spare capacity on my own laptop (quad-core with 16GB of RAM) would be good enough. So the backend worker for BCRecommender ended up being a local virtual machine using one core and 4GB of RAM.
After some coding I had a nice setup in place:
This system wasn’t going to scale, but I didn’t care. I just used it to discover new music, and it worked. I didn’t even bother registering a domain name, so it was all running for free.
Second iteration: “Django” backend & Parse#
A few months ago, Facebook announced that Parse’s free tier would include 30 requests/second. That’s over 2.5 million requests per day, which is quite a lot – probably enough to run the majority of websites on the internet. It seemed too good to be true, so I had to try it myself.
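(Quick sanity check on the arithmetic: 30 requests/second × 60 × 60 × 24 = 2,592,000 requests/day.)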
It took a few hours to convert the Django webserver/frontend code to Parse. This was fairly straightforward, and it had the added advantages of getting rid of some deployment scripts and having a more solid development environment. Parse supplies a command-line tool for deployment that constantly syncs the code to an app that is identical to the production app – much better than the Fabric script I had.
The disadvantages of the move to Parse were having to rewrite some of the backend in JavaScript (= less readable than Python), and a more complex data sync command (no longer just copying a big SQLite file). However, I would definitely use it for other projects because of the generous free tier, the availability of APIs for all major platforms, and the elimination of most operational concerns.
Current iteration: Goodbye Django, hello BCRecommender#
With the Django webserver out of the way, there was little use left for Django in the project. It took a few more hours to get rid of it, replacing the management commands with Commandr, and the SQLite database with MongoDB (wrapped with the excellent MongoEngine, which has matured a lot in recent years). MongoDB has become a more natural choice now, since it is the database used by Parse. I expect this setup of a local Python backend and a Parse frontend to work quite well (and remain virtually free) for the foreseeable future.
The only fixed cost I now have comes from registering the bcrecommender.com domain and managing it with Route 53. This wasn’t required when I was running it only for myself, and I could have just kept it under bcrecommender.parseapp.com, but I think it would be useful for other Bandcamp users. I would also like to use it as a training lab to improve my (poor) marketing skills – not having a dedicated domain just looks bad.
In summary, it’s definitely possible to build simple projects and host them for free. It also looks like my approach would scale way beyond the current BCRecommender volume. The next post in this series will cover some of the algorithms and general considerations of building the recommender system.
Bandcamp recommendation and discovery algorithms
This is the third part of a series of posts on my Bandcamp recommendations (BCRecommender) project. Check out the first part for the general motivation behind this project and the second part for the system architecture.
The main goal of the BCRecommender project is to help me find music I like. This post discusses the algorithmic approaches I took towards that goal. I’ve kept the descriptions at a fairly high level, without getting too much into the maths, as all recommendation algorithms essentially try to model simple intuition. Please leave a comment if you feel like something needs to be explained further.
Data & evaluation approach#
The data was collected from publicly-indexable Bandcamp fan and track/album (aka tralbum) pages. For each fan, it consists of the tralbum IDs they bought or wishlisted. For each tralbum, the saved data includes the type (track/album), URL, title, artist name, and the tags (as assigned by the artist).
At the moment, I have data for about 160K fans, 335K albums and 170K tracks. These fans have expressed their preference for tralbums through purchasing or wishlisting about 3.4M times. There are about 210K unique tags across the 505K tralbums, with the mean number of tags per tralbum being 7. These figures represent a fairly sparse dataset, which makes recommendation somewhat challenging. Perhaps this is why Bandcamp doesn’t do much algorithmic recommendation.
Before moving on to describe the recommendation approaches I played with, it is worth noting that at this stage, my way of evaluating the recommendations isn’t very rigorous. If I can easily find new music that I like, I’m happy. As such, offline evaluation approaches (e.g., some form of cross-validation) are unlikely to correlate well with my goal, so I just didn’t bother with them. Having more data would allow me to perform more rigorous online evaluation to see what makes other people happy with the recommendations.
Personalised recommendations with preferences (collaborative filtering)#
My first crack at recommendation generation was using collaborative filtering. The broad idea behind collaborative filtering is using only the preference matrix to find patterns in the data, and generate recommendations accordingly. The preference matrix is defined to have a row for each user and a column for each item. Each matrix element value indicates the level of preference by the user for the item. To keep things simple, I used unary preference values, where the element that corresponds to user/fan u and item/tralbum i is set to 1 if the fan purchased or wishlisted the tralbum, or set to missing otherwise.
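For illustration, such a sparse unary preference matrix can be built along these lines. The fan and tralbum IDs are hypothetical, not the real data:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical (fan, tralbum) purchase/wishlist events.
events = [("fan_a", "album_1"), ("fan_a", "track_2"), ("fan_b", "album_1")]

fans = sorted({fan for fan, _ in events})
tralbums = sorted({tralbum for _, tralbum in events})
fan_index = {fan: i for i, fan in enumerate(fans)}
tralbum_index = {tralbum: j for j, tralbum in enumerate(tralbums)}

# Unary preferences: 1 where a fan bought or wishlisted a tralbum; everything else is missing.
rows = [fan_index[fan] for fan, _ in events]
cols = [tralbum_index[tralbum] for _, tralbum in events]
preferences = csr_matrix((np.ones(len(events)), (rows, cols)),
                         shape=(len(fans), len(tralbums)))
print(preferences.toarray())
```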
A simple example for collaborative filtering is in the following image, which was taken from the Wikipedia article on the topic.
I used matrix factorisation as the collaborative filtering algorithm. This algorithm was a key part of the winning team’s solution to the Netflix competition. Unsurprisingly, it didn’t work that well. The key issue is that there are 160K * (335K + 170K) = 80.8B possible preferences in the dataset, but only 3.4M (0.004%) preferences are given. What matrix factorisation tries to do is to predict the remaining 99.996% of preferences based on the tiny percentage of given data. This just didn’t yield any music recommendations I liked, even when I made the matrix denser by dropping fans and tralbums with few preferences. Therefore, I moved on to employing an algorithm that can use more data – the tags.
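To make the mechanics concrete, here's a minimal stochastic gradient descent sketch of matrix factorisation on a toy unary preference matrix. It isn't the code I used – in particular, implicit-feedback variants typically assign confidence weights to unobserved entries rather than ignoring them – but it shows the basic idea of learning low-dimensional fan and tralbum factors from the observed preferences:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy unary preference matrix: rows are fans, columns are tralbums, 1 = bought/wishlisted.
preferences = np.array([[1, 1, 0, 0],
                        [1, 0, 1, 0],
                        [0, 0, 1, 1]], dtype=float)
observed = preferences > 0  # only the 1s are given; the 0s are treated as missing here

n_factors, learning_rate, regularisation, n_epochs = 2, 0.05, 0.01, 200
fan_factors = 0.1 * rng.standard_normal((preferences.shape[0], n_factors))
tralbum_factors = 0.1 * rng.standard_normal((preferences.shape[1], n_factors))

# Stochastic gradient descent over the observed entries only.
for _ in range(n_epochs):
    for u, i in zip(*np.nonzero(observed)):
        old_fan_factors = fan_factors[u].copy()
        error = preferences[u, i] - old_fan_factors @ tralbum_factors[i]
        fan_factors[u] += learning_rate * (error * tralbum_factors[i] - regularisation * old_fan_factors)
        tralbum_factors[i] += learning_rate * (error * old_fan_factors - regularisation * tralbum_factors[i])

# Predicted scores for all fan/tralbum pairs; unseen tralbums can then be ranked per fan.
print(fan_factors @ tralbum_factors.T)
```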
Personalised recommendations with tags and preferences (collaborative filtering and content-based hybrid)#
Using data about the items is referred to as content-based recommendation in the literature. In the Bandcamp recommender case, the content data that is most easy to use is the tags that artists assign to their work. The idea is to build a profile for each fan based on tags for their tralbums, and recommend tralbums with tags that match the fan’s profile.
As mentioned above, the dataset contains 210K unique tags for 505K tralbums, which means that this representation of the dataset is also rather sparse. One obvious way of making it denser is by dropping rare tags. I also “tagged” each tralbum with a fan’s username if that fan purchased or wishlisted the tralbum. In addition to yielding a richer tralbum representation, this approach makes the recommendations likely to be less obvious than those based only on tags. For example, all tralbums tagged with rock are likely to be rock albums, but tralbums tagged with yanir are somewhat more varied.
To make the tralbum representation denser I used the latent Dirichlet allocation (LDA) implementation from the excellent gensim library. LDA assumes that there’s a fixed number of topics (distributions over tags, i.e., weighted lists of tags), and that every tralbum’s tags are generated from its topics. In practice, this magically yields clusters of tags and tralbums that can be used to generate recommendations. For example, the following word cloud presents the top tags in one cluster, which is focused on psychedelic-progressive rock. Each tralbum is assigned a probability of being generated from this cluster. This means that each tralbum is now represented as a probability distribution over a fixed number of topics – much denser than the raw tag data.
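A minimal sketch of this step with gensim, using toy tag lists (including a hypothetical fan "tag") rather than the real dataset:

```python
from gensim import corpora, models

# Each "document" is the tag list of one tralbum, optionally augmented with fan usernames.
tralbum_tags = [
    ["psychedelic", "progressive-rock", "rock", "fan_yanir"],
    ["ambient", "drone", "experimental"],
    ["rock", "progressive-rock", "psychedelic"],
]

dictionary = corpora.Dictionary(tralbum_tags)
corpus = [dictionary.doc2bow(tags) for tags in tralbum_tags]

# Fit LDA with a fixed number of topics; each tralbum then becomes a distribution over topics.
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)
for bow in corpus:
    print(lda[bow])  # [(topic_id, probability), ...] – a much denser representation
```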
Applying the Traction Book’s Bullseye framework to BCRecommender
This is the fourth part of a series of posts on my Bandcamp recommendations (BCRecommender) project. Check out previous posts on the general motivation behind this project, the system's architecture, and the recommendation algorithms.
Having used BCRecommender to find music I like, I’m certain that other Bandcamp fans would like it too. It could probably be extended to attract a wider audience of music lovers, but for now, just getting feedback from Bandcamp fans would be enough. There are about 200,000 fans that I know of – getting even a fraction of them to use and comment on BCRecommender would serve as a good guide to what’s worth building and improving.
In addition to getting feedback, the personal value for me in getting BCRecommender users is learning some general lessons on traction building. Like many technical people, I like building products and playing with data, but I don’t really enjoy sales and marketing (and that’s an understatement). One of my goals in working independently is forcing myself to get better at the things I’m not good at. To that end, I recently started reading Traction: A Startup Guide to Getting Customers by Gabriel Weinberg and Justin Mares.
The Traction book identifies 19 different channels for getting traction, and suggests a simple framework (named Bullseye) for ranking and quickly exploring the channels. They explain that many technical founders tend to focus on traction channels they’re familiar with, and that the effort invested in those channels tends to be rather small compared to the investment in building the product. The authors rightly note that “Almost every failed startup has a product. What failed startups don’t have is traction – real customer growth.” They argue that following a rigorous approach to gaining traction via their framework is likely to improve a startup’s chances of success. From personal experience, this is very likely to be true.
The key steps in the Bullseye framework are brainstorming ideas for each traction channel, ranking the channels into tiers, prioritising the most promising ones, testing them, and focusing on the channels that work. This is not a one-off process – channel suitability changes over time, and one needs to go through the process repeatedly as the product evolves and traction grows.
Here are the traction channels, ordered in the same order as in the book. Each traction channel is marked with a letter denoting its ranking tier from A (most appropriate) to C (unsuitable right now). A short explanation is provided for each channel.
Cool, writing everything up explicitly was actually helpful! The next step is to test the three channels that ranked the highest: SEO, content marketing and targeting blogs. I will report the results in future posts.
Greek Media Monitoring Kaggle competition: My approach
A few months ago I participated in the Kaggle Greek Media Monitoring competition. The goal of the competition was doing multilabel classification of texts scanned from Greek print media. Despite not having much time due to travelling and other commitments, I managed to finish 6th (out of 120 teams). This post describes my approach to the problem.
Data & evaluation#
The data consists of articles scanned from Greek print media in May-September 2013. Due to copyright issues, the organisers didn’t make the original articles available – competitors only had access to normalised tf-idf representations of the texts. This limited the options for feature engineering and made it impossible to consider things like word order, but it also made things somewhat simpler: since interesting features couldn’t be extracted, the focus was squarely on modelling.
Overall, there are about 65K texts in the training set and 35K in the test set, where the split is based on chronological ordering (i.e., the training articles were published before the test articles). Each article was manually labelled with one or more labels out of a set of 203 labels. For each test article, the goal is to infer its set of labels. Submissions were ranked using the mean F1 score.
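For reference, mean F1 over label sets can be computed along these lines; the handling of empty label sets below is my assumption and may differ from the official evaluation script:

```python
def mean_f1(true_label_sets, predicted_label_sets):
    """Mean of per-article F1 scores between predicted and true label sets."""
    scores = []
    for true_labels, predicted_labels in zip(true_label_sets, predicted_label_sets):
        true_labels, predicted_labels = set(true_labels), set(predicted_labels)
        if not true_labels and not predicted_labels:
            scores.append(1.0)  # assumption: an empty prediction matches an empty label set
            continue
        intersection = len(true_labels & predicted_labels)
        precision = intersection / len(predicted_labels) if predicted_labels else 0.0
        recall = intersection / len(true_labels) if true_labels else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)

print(mean_f1([{"politics", "economy"}, {"sports"}],
              [{"politics"}, {"sports", "culture"}]))  # 0.666...
```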
Despite being manually annotated, the data isn’t very clean. Issues include identical texts that have different labels, empty articles, and articles with very few words. For example, the training set includes ten “articles” with a single word. Five of these articles have the word 68839, but each of these five was given a different label. Such issues are not unusual in Kaggle competitions or in real life, but they do limit the general usefulness of the results since any model built on this data would fit some noise.
Local validation setup#
As mentioned in previous posts (How to (almost) win Kaggle competitions and Kaggle beginner tips) having a solid local validation setup is very important. It ensures you don’t waste time on weak submissions, increases confidence in the models, and avoids leaking information about how well you’re doing.
I used the first 35K training texts for local training and the following 30K texts for validation. While the article publication dates weren’t provided, I hoped that this would mimic the competition setup, where the test dataset consists of articles that were published after the articles in the training dataset. This seemed to work, as my local results were consistent with the leaderboard results. I’m pleased to report that this setup allowed me to have the lowest number of submissions of all the top-10 teams 🙂
Things that worked#
I originally wanted to use this competition to play with deep learning through Python packages such as Theano and PyLearn2. However, as this was the first time I worked on a multilabel classification problem, I got sucked into reading a lot of papers on the topic and never got around to doing deep learning. Maybe next time…
One of my key discoveries was that if you define a graph where the vertices are labels and there’s an edge between two labels if they appear together in a document’s label set, then there are two main connected components of labels and several small ones with single labels (see figure below). It is possible to train a linear classifier that distinguishes between the components with very high accuracy (over 99%). This allowed me to improve performance by training different classifiers on each connected component.
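A sketch of the label-graph construction with networkx, using hypothetical label sets for illustration:

```python
import networkx as nx

# Hypothetical label sets of a few documents.
label_sets = [{"politics", "economy"}, {"economy", "finance"}, {"sports"}]

graph = nx.Graph()
for labels in label_sets:
    labels = sorted(labels)
    graph.add_nodes_from(labels)
    # Connect every pair of labels that appear together in the same document's label set.
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            graph.add_edge(a, b)

# Each connected component can then get its own classifier.
print(list(nx.connected_components(graph)))  # e.g. [{'economy', 'finance', 'politics'}, {'sports'}]
```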
Greek Media Monitoring Kaggle competition: My approach
A few months ago I participated in the Kaggle Greek Media Monitoring competition. The goal of the competition was doing multilabel classification of texts scanned from Greek print media. Despite not having much time due to travelling and other commitments, I managed to finish 6th (out of 120 teams). This post describes my approach to the problem.
Data & evaluation#
The data consists of articles scanned from Greek print media in May-September 2013. Due to copyright issues, the organisers didn’t make the original articles available – competitors only had access to normalised tf-idf representations of the texts. This limited the options for doing feature engineering and made it impossible to consider things like word order, but it made things somewhat simpler as the focus was on modelling due to inability to extract interesting features.
Overall, there are about 65K texts in the training set and 35K in the test set, where the split is based on chronological ordering (i.e., the training articles were published before the test articles). Each article was manually labelled with one or more labels out of a set of 203 labels. For each test article, the goal is to infer its set of labels. Submissions were ranked using the mean F1 score.
Despite being manually annotated, the data isn’t very clean. Issues include identical texts that have different labels, empty articles, and articles with very few words. For example, the training set includes ten “articles” with a single word. Five of these articles have the word 68839, but each of these five was given a different label. Such issues are not unusual in Kaggle competitions or in real life, but they do limit the general usefulness of the results since any model built on this data would fit some noise.
Local validation setup#
As mentioned in previous posts (How to (almost) win Kaggle competitions and Kaggle beginner tips) having a solid local validation setup is very important. It ensures you don’t waste time on weak submissions, increases confidence in the models, and avoids leaking information about how well you’re doing.
I used the first 35K training texts for local training and the following 30K texts for validation. While the article publication dates weren’t provided, I hoped that this would mimic the competition setup, where the test dataset consists of articles that were published after the articles in the training dataset. This seemed to work, as my local results were consistent with the leaderboard results. I’m pleased to report that this setup allowed me to have the lowest number of submissions of all the top-10 teams 🙂
Things that worked#
I originally wanted to use this competition to play with deep learning through Python packages such as Theano and PyLearn2. However, as this was the first time I worked on a multilabel classification problem, I got sucked into reading a lot of papers on the topic and never got around to doing deep learning. Maybe next time…
One of my key discoveries was that if you define a graph where the vertices are labels and there’s an edge between two labels whenever they appear together in a document’s label set, then there are two main connected components of labels, plus several small components containing single labels (see figure below). It is possible to train a linear classifier that distinguishes between the two main components with very high accuracy (over 99%). This allowed me to improve performance by training separate classifiers for each connected component.
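Here's a rough sketch of the label graph idea, assuming each training article's labels are available as a Python set (I'm using networkx here for brevity; any connected-components routine would do):

```python
# Build a graph with one vertex per label and an edge between labels that
# co-occur in some article's label set, then extract its connected components.
from itertools import combinations
import networkx as nx

def label_components(label_sets):
    """label_sets: iterable of sets of label IDs, one set per training article."""
    graph = nx.Graph()
    for labels in label_sets:
        graph.add_nodes_from(labels)  # labels that never co-occur stay as isolated vertices
        for a, b in combinations(sorted(labels), 2):
            graph.add_edge(a, b)
    return list(nx.connected_components(graph))

# Toy example: two multi-label components ({1, 2, 3} and {4, 5}) plus a single-label one ({6}).
print(label_components([{1, 2}, {2, 3}, {4, 5}, {6}]))
```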
What is data science?
Data science has been a hot term in the past few years. Despite this fact (or perhaps because of it), it still seems like there isn't a single unifying definition of data science. This post discusses my favourite definition.
One of my reasons for doing a PhD was wanting to do something more interesting than “vanilla” software engineering. When I was in the final stages of my PhD, I started going to meetups to see what’s changed in the world outside academia. Back then, I defined myself as a “software engineer with a research background”, which didn’t mean much to most people. My first post-PhD job ended up being a data scientist at a small startup. As soon as I changed my LinkedIn title to Data Scientist, many offers started flowing. This is probably the reason why so many people call themselves data scientists these days, often diluting the term to a point where it’s so broad it becomes meaningless. This post presents my preferred data science definitions and my opinions on who should or shouldn’t call themselves a data scientist.
Defining data science#
I really like the definition quoted above, of data science as the intersection of software engineering and statistics. Ofer Mendelevitch goes into more detail, drawing a continuum of professions that ranges from software engineer on the left to pure statistician (or machine learning researcher) on the right.
BCRecommender Traction Update
This is the fifth part of a series of posts on my Bandcamp recommendations (BCRecommender) project. Check out previous posts on the general motivation behind this project, the system’s architecture, the recommendation algorithms, and initial traction planning.
In a previous post, I discussed my plans to apply the Bullseye framework from the Traction Book to BCRecommender, my Bandcamp recommendations project. In that post, I reviewed the 19 traction channels described in the book, and decided to focus on the three most promising ones: blogger outreach, search engine optimisation (SEO), and content marketing. This post discusses my progress to date.
Goals#
My initial traction goals were rather modest: get some feedback from real people, build up steady nonzero traffic to the site, and then increase that traffic to 10+ unique visitors per day. It’s worth noting that I have four other main areas of focus at the moment, so BCRecommender is not getting all the attention I could potentially give it. Nonetheless, I have made good progress on achieving my goals (the first two have been achieved, though traffic still fluctuates), and learnt a lot in the process.
Things that worked#
Blogger outreach. The most obvious people to contact are existing Bandcamp fans. It was straightforward to generate a list of prolific fans with blogs, as Bandcamp allows people to populate their profile with a short bio and links to their sites. I worked my way through part of the list, sending each fan an email introducing BCRecommender and asking for their feedback. Each email required some manual work, as the vast majority of people don’t have their email address listed on their Bandcamp profile page. I was careful not to be too spammy, which seemed to work: about 50% of the people I contacted visited BCRecommender, 20% responded with positive feedback, and 10% linked to BCRecommender in some form, with the largest volume of traffic coming from my Hypebot guest post. The problem with this approach is that it doesn’t scale, but the most valuable thing I got out of it was learning that people like the project and that there’s a real need for it.
Twitter. I’m not sure where Twitter falls as a traction channel. It’s probably somewhere between (micro)blogger outreach and content marketing. However you categorise Twitter, it has been working well as a source of traffic. Simply finding people who may be interested in BCRecommender and tweeting related content has proven to be a rather low-effort way of getting attention, which is great at this stage. I have a few ideas for driving more traffic from Twitter, which I will try as I go.
Things that didn’t work#
Content marketing. I haven’t really spent time doing serious content marketing apart from the Spotlights pilot. My vision for the spotlights was to generate quality articles automatically and showcase music on Bandcamp in an engaging way that helps people discover new artists, even if they don’t have a fan account. However, full automation of the spotlight feature would require a lot of work, and I think that there are lower-hanging fruits that I should focus on first. For example, finding interesting insights in the data and presenting them in an engaging way may be a better content strategy, as it would be unique to BCRecommender. For the spotlights, partnering with bloggers to write the articles may be a better approach than automation.
SEO. I expected BCRecommender to rank higher for “bandcamp recommendations” by now, as a result of my blogger outreach efforts. At the moment, it’s still on the second page for this query on Google, though it’s the first result on Bing and DuckDuckGo. Obviously, “bandcamp recommendations” is not the only query worth ranking for, but it’s very relevant to BCRecommender, and not too competitive (half of the first page results are old forum posts). One encouraging outcome from the work done so far is that my Hypebot guest post does appear on the first page. Nonetheless, I’m still interested in getting more search engine traffic. Ranking higher would probably require adding more relevant content on the site and getting more quality links (basically what SEO is all about).
Points to improve and next steps#
I could definitely do better work on all of the above channels. Contrary to what’s suggested by the Bullseye framework, I would like to put more effort into the channels that didn’t work well. The reason is that I think they didn’t work well because of lack of attention and weak experiments, rather than due to their unsuitability to BCRecommender.
As mentioned above, my main limiting factor is a lack of time to spend on the project. However, there’s no pressing need to hit certain traction milestones by a specific deadline. My stretch goals are to get all Bandcamp fans to check out the project (hundreds of thousands of people), and have a significant portion of them convert by signing up to updates (tens of thousands of people). Getting there will take time. So far I’m finding the process educational and enjoyable, which is a pleasant surprise.
Fitting noise: Forecasting the sale price of bulldozers (Kaggle competition summary)
Messy data, buggy software, but all in all a good learning experience...
Early last year, I had some free time on my hands, so I decided to participate in yet another Kaggle competition. Having never done any price forecasting work before, I thought it would be interesting to work on the Blue Book for Bulldozers competition, where the goal was to predict the sale price of auctioned bulldozers. I did alright, finishing 9th out of 476 teams. And the experience did turn out to be interesting, but not for the reasons I expected.
Data and evaluation#
The competition dataset consists of about 425K historical records of bulldozer sales. The training subset consists of sales from the 1990s through to the end of 2011, with the validation and testing periods being January-April 2012 and May-November 2012 respectively. The goal is to predict the sale price of each bulldozer, given the sale date and venue, and the bulldozer’s features (e.g., model ID, mechanical specifications, and machine-specific data such as machine ID and manufacturing year). Submissions were scored using the RMSLE measure.
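For reference, here's a minimal sketch of RMSLE (my own restatement of the measure, not the official scoring code):

```python
# Root mean squared logarithmic error: it penalises relative rather than absolute errors,
# so being $5K off on a cheap machine hurts more than on an expensive one.
import numpy as np

def rmsle(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

print(rmsle([10000, 50000], [12000, 45000]))
```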
Early in the competition (before I joined), there were many posts in the forum regarding issues with the data. The organisers responded by posting an appendix to the data, which included the “correct” information. From people’s posts after the competition ended, it seems like using the “correct” data consistently made the results worse. Luckily, I discovered this about a week before the competition ended. Reducing my reliance on the appendix made a huge difference in the performance of my models. This discovery was thanks to a forum post, which illustrates the general point on the importance of monitoring the forum in Kaggle competitions.
My approach: feature engineering, data splitting, and stochastic gradient boosting#
Having read the forum discussions on data quality, I assumed that spending time on data cleanup and feature engineering would give me an edge over competitors who focused only on data modelling. It’s well-known that simple models fitted on more/better data tend to yield better results than complex models fitted on less/messy data (aka GIGO – garbage in, garbage out). However, doing data cleaning and feature engineering is less glamorous than building sophisticated models, which is why many people avoid the former.
Sadly, the data was incredibly messy, so most of my cleanup efforts resulted in no improvements. Even intuitive modifications yielded poor results, like transforming each bulldozer’s manufacturing year into its age at the time of sale. Essentially, to do well in this competition, one had to fit the noise rather than remove it. This was rather disappointing, as one of the nice things about Kaggle competitions is being able to work on relatively clean data. Anomalies in the data included bulldozers that had supposedly been running for hundreds of years and machines that got sold years before they were manufactured (impossible for second-hand bulldozers!). It is obvious that Fast Iron (the company that sponsored the competition) would have obtained more usable models from this competition if they had spent more time cleaning up the data themselves.
Throughout the competition I went through several iterations of modelling and data cleaning. My final submission ended up being a linear combination of four models:
I ended up discarding old training data (before 2000) and the machine IDs (another surprise: even though some machines were sold multiple times, this information was useless). For the GBMs, I treated categorical features as ordinal, which sort of makes sense for many of the features (e.g., model series values are ordered). For the linear model, I just coded them as binary indicators.
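The following sketch illustrates the feature treatments mentioned above; the column names and values are made up, and this isn't my actual competition code:

```python
# Illustrative encodings: age at sale, ordinal codes for the GBMs,
# and binary indicators for the linear model.
import pandas as pd

df = pd.DataFrame({
    'sale_year': [2009, 2011, 2010],
    'year_made': [1999, 2005, 2001],
    'model_series': ['A', 'C', 'B'],
})

# Age at the time of sale (intuitive, but it didn't actually help in this competition).
df['age_at_sale'] = df['sale_year'] - df['year_made']

# For the GBMs: treat the categorical feature as ordinal via integer codes.
df['model_series_ordinal'] = df['model_series'].astype('category').cat.codes

# For the linear model: binary indicator (one-hot) columns.
indicators = pd.get_dummies(df['model_series'], prefix='model_series')
print(pd.concat([df, indicators], axis=1))
```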
The most important discovery: stochastic gradient boosting bugs#
This was the first time I used gradient boosting. Since I was using so many different models, it was hard to reliably tune the number of trees, so I figured I’d use stochastic gradient boosting and rely on out-of-bag (OOB) samples to set the number of trees. This led me to discover a bug in scikit-learn: the OOB scores were actually calculated on in-bag samples.
I reported the issue to the maintainers of scikit-learn and made an attempt at fixing it by skipping trees to obtain the OOB samples. My fix yielded better results than the buggy version, and in some cases I replaced a plain GBM with an ensemble of four stochastic GBMs, each with a subsample ratio of 0.5 and a different random seed, averaging their outputs.
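Here's a sketch of that averaged ensemble using scikit-learn's GradientBoostingRegressor; apart from the subsample ratio and the varying seeds, the hyperparameters are illustrative:

```python
# Four stochastic GBMs that differ only in their random seed, averaged at prediction time.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_stochastic_gbm_ensemble(X_train, y_train, n_models=4, **gbm_params):
    models = []
    for seed in range(n_models):
        gbm = GradientBoostingRegressor(subsample=0.5, random_state=seed, **gbm_params)
        models.append(gbm.fit(X_train, y_train))
    return models

def predict_ensemble(models, X):
    # Average the predictions of the individual models.
    return np.mean([model.predict(X) for model in models], axis=0)
```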
My improved results weren’t enough to convince the maintainers of scikit-learn to accept the pull request with my fix, as they didn’t like my idea of skipping trees. This is for a good reason — obtaining better results on a single dataset should be insufficient to convince anyone. They ended up fixing the issue by copying the implementation from R’s GBM package, which is known to underestimate the number of required trees/boosting iterations (see Section 3.3 in the GBM guide).
Recently, I had some time to test my tree skipping idea on the toy dataset used in the scikit-learn documentation. As the following figure shows, a smoothed variant of my tree skipping idea (TSO in the figure) yields superior results to the scikit-learn/R approach (SKO in the figure). The actual loss doesn’t matter — what matters is where it’s minimised. In this case TSO obtains the closest approximation of the number of iterations to the value that minimises the test error, which is a promising result.
SEO: Mostly about showing up?
In previous posts about getting traction for my Bandcamp recommendations project (BCRecommender), I mentioned search engine optimisation (SEO) as one of the promising traction channels. Unfortunately, early efforts yielded negligible traffic – most new visitors came from referrals from blogs and Twitter. It turns out that the problem was not showing up for the SEO game: most of BCRecommender’s pages were blocked for crawling via robots.txt because I was worried that search engines (=Google) would penalise the website for thin/duplicate content.
Recently, I beefed up most of the pages, created a sitemap, and removed most pages from robots.txt. This resulted in a significant increase in traffic, as illustrated by the above graph. The number of organic impressions went up from less than ten per day to over a thousand. This is expected to go up even further, as only about 10% of pages are indexed. In addition, some traffic went to my staging site because it wasn’t blocked from crawling (I had to set up a new staging site that is password-protected and add a redirect from the old site to the production site – a bit annoying but I couldn’t find a better solution).
I hope Google won’t suddenly decide that BCRecommender content is not valuable or too thin. The content is automatically generated, which is “bad”, but it doesn’t “consist of paragraphs of random text that make no sense to the reader but which may contain search keywords”. As a (completely unbiased) user, I think it is valuable to find similar albums when searching for an album you like – a use case that represents the majority of people who click through to BCRecommender. Judging from the main engagement measure I’m using (time spent on site), a good number of these people are happy with what they find.
More updates to come in the future. For now, my conclusion is: thin content is better than no content, as long as it’s relevant to what people are searching for and provides real value.
Stochastic Gradient Boosting: Choosing the Best Number of Iterations
In my summary of the Kaggle bulldozer price forecasting competition, I mentioned that part of my solution was based on stochastic gradient boosting. To reduce runtime, the number of boosting iterations was set by minimising the loss on the out-of-bag (OOB) samples, where each sample’s OOB prediction skips the trees for which that sample was in-bag. This approach was motivated by a bug in scikit-learn, where the OOB loss estimate was calculated on the in-bag samples, meaning that it always improved (and thus was useless for the purpose of setting the number of iterations).
The bug in scikit-learn was fixed by porting the solution used in R’s GBM package, where the number of iterations is estimated by tracking the improvement in the loss on the OOB samples at each boosting iteration and choosing the iteration that minimises the cumulative OOB loss. This approach is known to underestimate the number of required iterations, which means that it’s not very useful in practice. This underestimation may be due to the fact that the GBM estimate is partly based on in-bag samples, as the OOB samples for the Nth iteration are likely to have been in-bag in previous iterations.
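For concreteness, here's a minimal sketch of that OOB-based estimate with scikit-learn's fixed implementation, in the spirit of the example in its documentation; the dataset and hyperparameters are arbitrary:

```python
# Choose the number of boosting iterations by maximising the cumulative OOB improvement,
# i.e., minimising the estimated OOB loss. oob_improvement_ is only populated when subsample < 1.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_friedman1(n_samples=2000, random_state=0)
gbm = GradientBoostingRegressor(n_estimators=1000, learning_rate=0.05,
                                subsample=0.5, random_state=0)
gbm.fit(X, y)

best_n_estimators = int(np.argmax(np.cumsum(gbm.oob_improvement_))) + 1
print(best_n_estimators)  # typically an underestimate of the truly optimal number
```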
I was curious about how my approach compares to the GBM method. Preliminary results on the toy dataset from scikit-learn’s documentation looked promising:
Automating Parse.com bulk data imports
Parse is a great backend-as-a-service (BaaS) product. It removes much of the hassle involved in backend devops with its web hosting service, SDKs for all the major mobile platforms, and a generous free tier. Parse does have its share of flaws, including various reliability issues (which seem to be getting rarer) and limitations on what you can do (which is a reasonable price to pay for working within a sandboxed environment). One such limitation is the lack of APIs to perform bulk data imports. This post introduces my workaround for this limitation (tl;dr: it’s a PhantomJS script).
Update: The script no longer works due to changes to Parse’s website. I won’t be fixing it since I’ve migrated my projects off the platform. If you fix it, let me know and I’ll post a link to the updated script here.
I use Parse for two of my projects: BCRecommender and Price Dingo. In both cases, some of the data is generated outside Parse by a Python backend. Doing all the data processing within Parse is not a viable option, so a solution for importing this data into Parse is required.
My original solution for data import was using the Parse REST API via ParsePy. The problem with this solution is that Parse billing is done on a requests/second basis. The free tier includes 30 requests/second, so importing BCRecommender’s ~million objects takes about nine hours when operating at maximum capacity. However, operating at maximum capacity causes other client requests to be dropped (i.e., real users suffer). Hence, some sort of rate limiting is required, which makes the sync process take even longer.
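The rate limiting itself can be as simple as the sketch below. This is a generic illustration rather than my actual sync code, and upload_object is a stand-in for whatever performs the REST call (e.g., via ParsePy):

```python
# Throttle uploads to a fixed request budget so real user requests aren't starved.
import time

def rate_limited_upload(objects, upload_object, max_requests_per_second=10):
    min_interval = 1.0 / max_requests_per_second
    for obj in objects:
        started = time.time()
        upload_object(obj)  # stand-in for the actual REST call
        elapsed = time.time() - started
        if elapsed < min_interval:
            time.sleep(min_interval - elapsed)
```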
I thought that using batch requests would speed up the process, but it actually slowed it down! This is because batch requests are billed according to the number of sub-requests, so making even one successful batch request per second with the maximum number of sub-requests (50) causes more requests to be dropped. I implemented some code to retry failed requests, but the whole process was just too brittle.
A few months ago I discovered that Parse supports bulk data import via the web interface (with no API support). This feature comes with the caveat that existing collections can’t be updated: a new collection must be created. This is actually a good thing, as it essentially makes the collections immutable. And immutability makes many things easier.
BCRecommender data gets updated once a month, so I was happy with manually importing the data via the web interface. As a price comparison engine, Price Dingo’s data changes more frequently, so manual updates are out of the question. For Price Dingo to be hosted on Parse, I had to find a way to automate bulk imports. Some people suggest emulating the requests made by the web interface, but this requires relying on hardcoded cookie and CSRF token data, which may change at any time. A more robust solution would be to scriptify the manual actions, but how? PhantomJS, that’s how.
I ended up implementing a PhantomJS script that logs in as the user and uploads a dump to a given collection. This script is available on GitHub Gist. To run it, simply install PhantomJS and run: